Months ago I wrote a blog article on “How to write simple but sound Yara rules”. Since then,
the techniques and tools mentioned there have improved. I’d like to give you a brief update on
certain Yara features that I frequently use and on the tools that I use to generate and test my rules.

Handle Very Specific Strings Differently
In the past I was glad to see very specific strings in samples and sometimes used these
strings as the only indicator for detection. E.g. whenever I found a certain typo in the PE
header fields like “Micorsoft Corportation”, I cheered and thought that this would make a great
signature. But, and I have to admit that now, this only makes a nice signature. A great
signature not only matches a certain sample in the most condensed way but also
aims to match similar samples created by the same author or group.
Look at the following rule:
rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
        score = 60
    strings:
        $x1 = "Micorsoft Corportation" fullword wide
        $x2 = "IM Monnitor Service" fullword wide
        $a1 = "imemonsvc.dll" fullword wide
        $a2 = "iphlpsvc.tmp" fullword
        $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s1 = "urlmon" fullword
        $s2 = "Registered trademarks and service marks are the property of their" wide
        $s3 = "XpsUnregisterServer" fullword
        $s4 = "XpsRegisterServer" fullword
    condition:
        uint16(0) == 0x5A4D and
        (
            ( 1 of ($x*) ) or
            ( 2 of ($a*) and all of ($s*) )
        )
}
When I review the 20 strings that yarGen generates, I try to sort the extracted strings into
3 different groups:
- Very specific strings, $x* in the rule above (one of them is sufficient for successful
  detection, e.g. IP addresses, payload URLs, PDB paths, user profile directories)
- Specific strings, $a* in the rule above (strings that look good but may appear in goodware
  as well, e.g. “wwwlib.dll”)
- Other strings, $s* in the rule above (even strings that appear in goodware, but no random
  code from compressed or encrypted data; e.g. “ModuleStart”)
Then I create a condition that defines:
- A certain magic header (remove it in case of ASCII text like scripts or webshells)
- 1 of the very specific strings OR
- some of the specific strings combined with many (but not all) of the common strings
Here is another example that has only very specific strings (x) and common strings (s):
rule Cobra_Trojan_Stage1 {
    meta:
        description = "Cobra Trojan - Stage 1"
        author = "Florian Roth"
        reference = "https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.html"
        date = "2015/02/18"
        hash = "a28164de29e51f154be12d163ce5818fceb69233"
    strings:
        $x1 = "KmSvc.DLL" fullword wide
        $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii
        $s1 = "Microsoft Corporation. All rights reserved." fullword wide
        $s2 = "srservice" fullword wide
        $s3 = "Key Management Service" fullword wide
        $s4 = "msimghlp.dll" fullword wide
        $s5 = "_ServiceCtrlHandler@16" fullword ascii
        $s6 = "ModuleStart" fullword ascii
        $s7 = "ModuleStop" fullword ascii
        $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide
    condition:
        uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*)
}
If you can’t create a rule that is sufficiently specific, I recommend the following methods to
restrict the rule:
- Magic Header (use it as the first element in the condition, see the performance guidelines,
  e.g. "uint16(0) == 0x5A4D")
- File Size (malware that mimics valid system files, drivers or legitimate software often
  differs significantly in size; try to find the valid files online and set a size value in your
  rule, e.g. "filesize > 200KB and filesize < 600KB")
- String Location (see the “Location is Everything” section)
- Exclude strings that occur in false positives (e.g. $fp1 = "McAfeeSig")
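The restrictions listed above can be combined in one condition. The following is only a minimal sketch under stated assumptions: the rule name is hypothetical, $s1 is a placeholder for a string that is not specific enough on its own, and the size limits are the illustrative values from the list, not measured ones.

```yara
rule Hypothetical_Restricted_Rule {
    meta:
        description = "Sketch: magic header, file size range and false positive exclusion"
    strings:
        $s1 = "ModuleStart" fullword ascii   // placeholder: not specific enough on its own
        $fp1 = "McAfeeSig"                   // string that occurs in false positives
    condition:
        uint16(0) == 0x5A4D                  // magic header (MZ) as first element
        and filesize > 200KB and filesize < 600KB
        and $s1
        and not 1 of ($fp*)                  // drop files that contain a false positive string
}
```

Using the $fp* wildcard means that further false positive strings can be added later without touching the condition.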

Location is Everything
One of the most underestimated features of Yara is the possibility to define a range in which
strings must occur in order to match. I used this technique to create a rule that detects Metasploit
Meterpreter payloads quite reliably, even if they are encoded or cloaked. How does that work?
If you see malware code that is hidden in an overlay at the end of a valid executable (e.g.
“ab.exe”) and you see only strings that are typical function exports or that mimic a well-known
executable, ask the following questions:
- Is it normal that these strings are located at this location in the file?
- Is it normal that these strings occur more than once in that file?
- Is the distance between two strings somehow specific?
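The second and third question map onto Yara's string count (#) and string offset (@) operators. The sketch below assumes two placeholder strings and a hypothetical maximum distance of 64 bytes; the rule name and all values are illustrative and not taken from a real sample.

```yara
rule Hypothetical_Count_And_Distance {
    strings:
        $s1 = "ModuleStart" fullword ascii   // placeholder string
        $s2 = "ModuleStop" fullword ascii    // placeholder string
    condition:
        uint16(0) == 0x5A4D
        and #s1 > 1                          // $s1 occurs more than once in the file
        and @s2[1] > @s1[1]                  // first $s2 is located after first $s1 ...
        and @s2[1] - @s1[1] < 64             // ... and less than 64 bytes away (assumed value)
}
```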

[Screenshot: Malware Strings]

In case of the unspecific malware code in the PE overlay, try to define a rule that looks for a
certain file size (e.g. filesize > 800KB) and the malware strings relative to the end of the file
(e.g. $s1 in (filesize-500..filesize)).
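Put together as a complete rule, this could look like the following sketch. The rule name and the string are hypothetical placeholders; the size and range values are the ones from the paragraph above.

```yara
rule Hypothetical_Overlay_Strings {
    strings:
        $s1 = "ModuleStart" fullword ascii   // placeholder for an unspecific malware string
    condition:
        uint16(0) == 0x5A4D
        and filesize > 800KB
        and $s1 in (filesize-500..filesize)  // string must start within the last 500 bytes
}
```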
The following example shows an unspecified webshell that contains strings that an attacker may
modify in future versions when deploying it in a victim’s network. Always try to
extract strings that are less likely to be changed.

[Screenshot: Webshell Code PHP]

The variable name “$code” is more likely to change than the function combination
“@eval(gzinflate(base64_decode(” at the end of the file. It is possible that valid PHP code
contains “eval(gzinflate(base64_decode(” somewhere in the code, but it is less likely that it
occurs in the last 50 bytes of the file.
I therefore wrote the following rule:
rule Webshell_b374k_related_1 {
    meta:
        description = "Detects b374k related webshell"
        author = "Florian Roth"
        reference = "https://goo.gl/ZuzV2S"
        score = 65
        hash = "d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7"
        date = "2015-10-17"
    strings:
        $m1 = "<?php"
        $s1 = "@eval(gzinflate(base64_decode(" ascii
    condition:
        $m1 at 0 and $s1 in (filesize-50..filesize) and filesize < 20KB
}

Performance Guidelines
I collected many ideas by Wesley Shields and Victor M. Alvarez and composed a gist called
“Yara Performance Guidelines”. This guide shows you how to write Yara rules that use fewer CPU
cycles by avoiding CPU-intensive checks or by using the new condition checking shortcuts
introduced in Yara version 3.4.
Yara Performance Guidelines

PE Module
People sometimes ask why I don’t use the PE module. The reason is simple: I avoid using
modules that are rather new and would like to see them thoroughly tested prior to using them in my
scanners running in production environments. It is a great module and a lot of effort went into
it. I would always recommend using the PE module in lab environments or sandboxes. In
scanners that walk huge directory trees, a minor memory leak in one of the modules could
lead to severe memory shortages. I’ll give it another year to prove its stability and then start
using it in my rules.

yarGen
yarGen has had an opcode feature since the last minor version. It is active by default but only
useful in cases in which not enough strings could be extracted.
I currently use the following parameters to create my rules:
python yarGen.py --noop -z 0 -a "Florian Roth" -r "http://link-to-sample" /mal/malware
The problem with the opcode feature is that it requires about 2.5 GB more main memory
during rule creation. I’ll change it to an optional parameter in the next version.

yarAnalyzer
yarAnalyzer is a rather new tool that focuses on rule coverage. After creating a bigger rule
set or a generic rule that should match on several samples, you’d like to check the coverage
of your rules in order to detect overlapping rules (which is often OK).
yarAnalyzer helps you to get an overview of:
- rules that match on more than one sample
- samples that show hits from more than one rule
- rules without hits
- samples without hits
[Screenshot: yarAnalyzer]

yarAnalyzer Github Repository

String Extraction and Colorization
To review the strings in a sample I use a simple shell one-liner that a good friend sent me
once.
“strings” version for Linux
#!/bin/bash
(strings -a -td "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; strings -a -td -el "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n
“gstrings” version for OS X (sudo port install binutils)
#!/bin/bash
(gstrings -a -td "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; gstrings -a -td -el "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n
It produces output as shown in the above screenshot with green text and the description
“Malware Strings”, showing the offset, ASCII (A) or wide (W), and the string at this offset.
To colorize the strings, check my new tool “prisma”, which colorizes any type of
standard output.
[Screenshot: Prisma STDOUT colorization]

Contact
Follow me on Twitter: @Cyb3rOps